A distributed chunk calculation approach for self-scheduling of parallel applications on distributed-memory systems
Abstract
Loop scheduling techniques aim to achieve load-balanced executions of scientific applications. Dynamic loop self-scheduling (DLS) libraries for distributed-memory systems are typically MPI-based and employ a centralized chunk calculation approach (CCA) to assign variably-sized chunks of loop iterations. We present a distributed chunk calculation approach (DCA) that supports various types of DLS techniques. Using both CCA and DCA, twelve DLS techniques are implemented and evaluated in different CPU slowdown scenarios. The results show that the DLS techniques using DCA outperform their corresponding ones with CCA, especially in extreme system slowdown scenarios.
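The difference between the two approaches can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the paper's implementation: guided self-scheduling (chunk size = ceil(remaining / workers)) is used as a well-known example technique, since the abstract does not name the twelve techniques, and the names `gss_chunk_size`, `SharedState`, `worker_claim`, `centralized_schedule`, and `distributed_schedule` are hypothetical. In the centralized variant one coordinator computes every chunk size; in the distributed variant each worker reads minimal shared state and computes its own chunk size.

```python
from dataclasses import dataclass


def gss_chunk_size(remaining: int, workers: int) -> int:
    """Guided self-scheduling rule (illustrative choice): ceil(remaining / workers)."""
    return (remaining + workers - 1) // workers


def centralized_schedule(total: int, workers: int) -> list[tuple[int, int]]:
    """CCA-style sketch: a single coordinator computes every chunk as (start, size)."""
    chunks = []
    start = 0
    while start < total:
        size = gss_chunk_size(total - start, workers)
        chunks.append((start, size))
        start += size
    return chunks


@dataclass
class SharedState:
    """Hypothetical minimal shared state: only the next unscheduled iteration."""
    total: int
    next_start: int = 0


def worker_claim(state: SharedState, workers: int):
    """DCA-style sketch: the worker itself computes the chunk size.

    In a real distributed-memory setting the read and update of the shared
    state would have to be atomic; this simulation is sequential, so that
    concern is assumed away here.
    """
    remaining = state.total - state.next_start
    if remaining <= 0:
        return None
    size = gss_chunk_size(remaining, workers)
    chunk = (state.next_start, size)
    state.next_start += size
    return chunk


def distributed_schedule(total: int, workers: int) -> list[tuple[int, int, int]]:
    """Simulate workers claiming chunks in round-robin order: (worker, start, size)."""
    state = SharedState(total)
    chunks = []
    worker = 0
    while True:
        chunk = worker_claim(state, workers)
        if chunk is None:
            break
        chunks.append((worker, *chunk))
        worker = (worker + 1) % workers
    return chunks


if __name__ == "__main__":
    print(centralized_schedule(100, 4))
    print(distributed_schedule(100, 4))
```

Both variants produce the same sequence of chunks in this simulation; what changes is where the chunk size is computed, which is the aspect the abstract contrasts.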
Similar resources
Runtime Incremental Parallel Scheduling (RIPS) on Distributed Memory Computers - Parallel and Distributed Systems, IEEE Transactions on
Runtime Incremental Parallel Scheduling (RIPS) is an alternative strategy to the commonly used dynamic scheduling. In this scheduling strategy, the system scheduling activity alternates with the underlying computation work. RIPS utilizes the advanced parallel scheduling technique to produce a low-overhead, high-quality load balancing, as well as adapting to irregular applications. This paper pr...
Chunk: A Framework for Modular Distributed Shared Memory Systems
We present Chunk, a framework for building modular distributed shared memory systems for UNIX. Chunk allows applications that are designed to share local memory through the UNIX memory mapped file mechanism (mmap) to be able to share memory across different physical hosts without modifications. Chunk’s modular architecture enables the use of a variety of memory-sharing policies. We present a DS...
Parallel Loop Scheduling Approaches for Distributed and Shared Memory Systems
In this paper, we propose different approaches for the parallel loop scheduling problem on distributed as well as shared memory systems. Specifically, we propose adaptive loop scheduling models in order to achieve load balancing, low runtime scheduling, low synchronization overhead and low communication overhead. Our models are based on an adaptive determination of the chunk size and an exploit...
Adaptively Scheduling Parallel Loops in Distributed Shared-Memory Systems
Using runtime information of load distributions and processor affinity, we propose an adaptive scheduling algorithm and its variations from different control mechanisms. The proposed algorithm applies different degrees of aggressiveness to adjust loop scheduling granularities, aiming at improving the execution performance of parallel loops by making scheduling decisions that match the real work...
Scalability of Finite Element Applications on Distributed-memory Parallel Computers
This paper demonstrates that scalability and competitive efficiency can be achieved for unstructured grid finite element applications on distributed memory machines, such as the Connection Machine CM-5 system. The efficiency of finite element solvers is analyzed through two applications: an implicit computational aerodynamics application and an explicit solid mechanics application. Scalability of mesh ...
Journal
Journal title: Journal of Computational Science
Year: 2021
ISSN: 1877-7511, 1877-7503
DOI: https://doi.org/10.1016/j.jocs.2020.101284